Classifier model command injection detection in tool calls #6599

dorien-koelemeijer merged 15 commits into main from
Conversation
Pull request overview
This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyze both tool call commands (for command injection) and conversation history (for prompt injection) independently.
Changes:
- Added `ClassifierType` enum to distinguish between command and prompt classifiers
- Refactored UI to extract `ClassifierEndpointInputs` component and added command classifier settings
- Extended `ClassificationClient` with `from_model_type` method to support model type-based lookups
- Modified security alert messages to remove confidence percentage and add command previews
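For reference, a minimal sketch of what the `ClassifierType` enum described above could look like; only the two variants are taken from this PR, while the derives and doc comments are assumptions.

```rust
// Sketch only: the actual definition in crates/goose/src/security/scanner.rs
// may differ in derives and documentation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClassifierType {
    /// Classifier applied to tool call commands (command injection).
    Command,
    /// Classifier applied to conversation history (prompt injection).
    Prompt,
}
```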
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| ui/desktop/src/components/settings/security/SecurityToggle.tsx | Refactored endpoint input UI into reusable component; added command classifier configuration section with toggle and endpoint inputs |
| crates/goose/src/security/scanner.rs | Introduced dual-classifier architecture with separate command and prompt classifiers; updated explanation builder to include command previews; modified logging messages |
| crates/goose/src/security/classification_client.rs | Added model_type field to ModelEndpointInfo; implemented from_model_type method for type-based model lookup; removed detailed error context and logging statements |
| crates/goose/src/security/security_inspector.rs | Simplified security alert message format by removing confidence percentage display |
```rust
pub fn with_ml_detection() -> Result<Self> {
    let classifier_client = Self::create_classifier_from_config()?;
    let command_classifier = Self::create_classifier(ClassifierType::Command).ok();
    let prompt_classifier = Self::create_classifier(ClassifierType::Prompt).ok();

    if command_classifier.is_none() && prompt_classifier.is_none() {
        anyhow::bail!("ML detection enabled but no classifiers could be initialized");
    }

    Ok(Self {
        pattern_matcher: PatternMatcher::new(),
        classifier_client: Some(classifier_client),
        command_classifier,
        prompt_classifier,
    })
}
```
The command classifier functionality lacks test coverage. Consider adding tests that verify: 1) command classifier is used when enabled and available, 2) falls back to pattern matching when command classifier is not available, 3) the dual-classifier architecture works correctly with both classifiers enabled.
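For example, a minimal sketch of the first such test, assuming the scanner struct is called `SecurityScanner` and that classifier configuration comes from the `SECURITY_ML_MODEL_MAPPING` environment variable (env-var mutation needs an `unsafe` block on Rust edition 2024):

```rust
#[cfg(test)]
mod ml_detection_tests {
    use super::*;

    // Sketch only: `SecurityScanner` is an assumed name for the scanner type.
    // With no model mapping configured, neither classifier can be built, so
    // with_ml_detection() should return an error instead of a scanner that
    // silently skips ML detection.
    #[test]
    fn with_ml_detection_errors_when_no_classifier_is_available() {
        std::env::remove_var("SECURITY_ML_MODEL_MAPPING");
        let result = SecurityScanner::with_ml_detection();
        assert!(
            result.is_err(),
            "expected an error when no classifiers could be initialized"
        );
    }
}
```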
```rust
pub fn from_model_type(model_type: &str, timeout_ms: Option<u64>) -> Result<Self> {
    let mapping = serde_json::from_str::<ModelMappingConfig>(
        &std::env::var("SECURITY_ML_MODEL_MAPPING")
            .context("SECURITY_ML_MODEL_MAPPING environment variable not set")?,
    )
    .context("Failed to parse SECURITY_ML_MODEL_MAPPING JSON")?;

    let (_, model_info) = mapping
        .models
        .iter()
        .find(|(_, info)| info.model_type.as_deref() == Some(model_type))
        .context(format!(
            "No model with type '{}' found in SECURITY_ML_MODEL_MAPPING",
            model_type
        ))?;

    Self::new(
        model_info.endpoint.clone(),
        timeout_ms,
        None,
        Some(model_info.extra_params.clone()),
    )
}
```
The new from_model_type method lacks test coverage. Consider adding tests for: 1) successfully finding a model by type, 2) handling cases where no model with the specified type exists, 3) handling invalid JSON in SECURITY_ML_MODEL_MAPPING.
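For example, a hedged sketch of those three tests; the JSON field names (`models`, `model_type`, `endpoint`, `extra_params`) are inferred from the code above, and the exact schema of `SECURITY_ML_MODEL_MAPPING` is an assumption:

```rust
#[test]
fn from_model_type_resolves_rejects_and_validates() {
    // Assumed mapping shape: a "models" object keyed by model name, carrying
    // the fields read by from_model_type above. Env-var mutation needs an
    // unsafe block on Rust edition 2024.
    std::env::set_var(
        "SECURITY_ML_MODEL_MAPPING",
        r#"{"models":{"cmd-bert":{"model_type":"command","endpoint":"http://localhost:8080/classify","extra_params":{}}}}"#,
    );

    // 1) A model with the requested type resolves to a client.
    assert!(ClassificationClient::from_model_type("command", Some(500)).is_ok());

    // 2) An unknown type yields an error.
    assert!(ClassificationClient::from_model_type("does-not-exist", Some(500)).is_err());

    // 3) Invalid JSON in the env var yields an error.
    std::env::set_var("SECURITY_ML_MODEL_MAPPING", "not json");
    assert!(ClassificationClient::from_model_type("command", Some(500)).is_err());
}
```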
Force-pushed from 8be782f to 8eade1b

Force-pushed from dbc571b to 189c6f0
michaelneale left a comment
LGTM - didn't get a chance to hand test it. If you update to main soon it should hopefully clear out the live failure.
…een prompt and command injection models
…command injection model
Force-pushed from 189c6f0 to 028effb
Hi @dorien-koelemeijer, checking if you are still planning on merging?
…upport * origin/main: (79 commits) fix[format/openai]: return error on empty msg. (#6511) Fix: ElevenLabs API Key Not Persisting (#6557) Logging uplift for model training purposes (command injection model) [Small change] (#6330) fix(goose): only send agent-session-id when a session exists (#6657) BERT-based command injection detection in tool calls (#6599) chore: [CONTRIBUTING.md] add Hermit to instructions (#6518) fix: update Gemini context limits (#6536) Document r slash command (#6724) Upgrade GitHub Actions to latest versions (#6700) fix: Manual compaction does not update context window. (#6682) Removed the Acceptable Usage Policy (#6204) Document spellcheck toggle (#6721) fix: docs workflow cleanup and prevent cancellations (#6713) Docs: file bug directly (#6718) fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679) Speed up Databricks provider init by removing fetch of supported models (#6616) fix: correct typos in documentation and Justfile (#6686) docs: frameDomains and baseUriDomains for mcp apps (#6684) docs: add Remotion video creation tutorial (#6675) docs: export recipe and copy yaml (#6680) ... # Conflicts: # ui/desktop/src/hooks/useChatStream.ts
…ovider * 'main' of github.com:block/goose: fix slash and @ keyboard navigation popover background color (#6550) fix[format/openai]: return error on empty msg. (#6511) Fix: ElevenLabs API Key Not Persisting (#6557) Logging uplift for model training purposes (command injection model) [Small change] (#6330) fix(goose): only send agent-session-id when a session exists (#6657) BERT-based command injection detection in tool calls (#6599) chore: [CONTRIBUTING.md] add Hermit to instructions (#6518) fix: update Gemini context limits (#6536) Document r slash command (#6724) Upgrade GitHub Actions to latest versions (#6700)
* 'main' of github.com:block/goose: Create default gooseignore file when missing (#6498) fix slash and @ keyboard navigation popover background color (#6550) fix[format/openai]: return error on empty msg. (#6511) Fix: ElevenLabs API Key Not Persisting (#6557) Logging uplift for model training purposes (command injection model) [Small change] (#6330) fix(goose): only send agent-session-id when a session exists (#6657) BERT-based command injection detection in tool calls (#6599) chore: [CONTRIBUTING.md] add Hermit to instructions (#6518) fix: update Gemini context limits (#6536) Document r slash command (#6724) Upgrade GitHub Actions to latest versions (#6700) fix: Manual compaction does not update context window. (#6682) Removed the Acceptable Usage Policy (#6204) Document spellcheck toggle (#6721) fix: docs workflow cleanup and prevent cancellations (#6713) Docs: file bug directly (#6718)
Summary
This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyse both tool call commands (for command injection) and conversation history (for prompt injection) independently.
Changes:
- `developer__shell` tool commands are evaluated to prevent false positives from non-executable content (e.g., `rm -rf` appearing in code examples or test cases); see the sketch after this list
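A hypothetical illustration of that gate; the constant and function names are invented for this sketch, and the scanner's real integration point is not shown in this excerpt.

```rust
// Hypothetical helper: only shell tool calls contain directly executable
// commands, so only they are forwarded to the command-injection classifier.
// `rm -rf` appearing in code examples passed to other tools is ignored.
const SHELL_TOOL: &str = "developer__shell";

fn should_run_command_classifier(tool_name: &str) -> bool {
    tool_name == SHELL_TOOL
}
```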
Type of Change

AI Assistance
Testing
Local testing using `just run-ui`.

🚨 Need to do some final testing after cleanup before merging (in progress)
Notes
🚨 This PR shouldn't be merged until the `SECURITY_ML_MODEL_MAPPING` var in internal-releases has been updated (edit: this has now been merged, so should be all good).

Screenshots/Demos (for UX changes)
Prompt injection detection settings in the UI have been updated (a new toggle was added):